Fix goroutine leaks in plugin/sampling/strategystore/adaptive #5310
This mainly involved ensuring that all goroutines started by the `Processor` are shut down in a `Close` method (which also blocks on them returning via a `WaitGroup`). Adding this flagged an issue where `runUpdateProbabilitiesLoop` had a long delay, so tests need to be able to override the default `Processor.followerRefreshInterval`, or they take a long time to run.
Signed-off-by: Will Sewell <[email protected]>
Force-pushed from 9440ab0 to 250ab88.
Haven't looked at the code yet, but from the description it sounds like the exit is simply not implemented properly if you have to wait for 20 seconds. If there is a loop with a timer, probably blocking in `select`, the good approach is to add another "stop" channel that the `Close` function can close, which would cause an exit from the `select`.
Ah yes, that would be better. I'll have a go at reworking it.
Rather than having to set the "delay phase" to a low value, we instead make it possible for the `shutdown` channel to unblock the delay.
Signed-off-by: Will Sewell <[email protected]>
Codecov Report
Attention: Patch coverage is …

| | main | #5310 | +/- |
|---|---|---|---|
| Coverage | 95.06% | 95.12% | +0.05% |
| Files | 340 | 340 | |
| Lines | 16612 | 16640 | +28 |
| Hits | 15792 | 15828 | +36 |
| Misses | 631 | 624 | -7 |
| Partials | 189 | 188 | -1 |
```go
}()
defer func() {
	close(p.shutdown)
	<-done
```
Do we really need the `done` sync here? As long as the goroutines are not blocked they will exit, and goleak will tolerate that; it does not expect an immediately clean state when it's called.
Ah, thanks for flagging this. They are not required (tests pass without them). I was also misunderstanding the goleak semantics.
I think the issue I was having is that before dfe7adc, these tests were failing because a goroutine could be blocked in `time.Sleep`.
I'll remove this.
@@ -24,6 +24,9 @@ import (

```go
// StrategyStore keeps track of service specific sampling strategies.
type StrategyStore interface {
	// Close() from io.Closer stops the processor from calculating probabilities.
	io.Closer
```
You added this to the interface, but I am not seeing any non-test code that's actually calling it.
Adding `Closer` to an interface is often contentious: some people argue that if you create an object via `NewX() *X`, you already have the ability to call `Close` on it without adding a `Close` function to the interface that `X` implements. This doesn't work well when factories are involved, since the factory returns an interface, not an actual struct. One other workaround is doing a runtime check for the `io.Closer` interface and only then calling `Close`; this is why I am asking about prod code calling it.
I'm OK to keep `io.Closer` in the interface because both real implementations are now closable (the static store didn't have `Close` before we added the file watcher to it).
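A sketch of the runtime-check alternative mentioned above: the interface does not embed `io.Closer`, and callers close only values that happen to implement it. `StrategyStore` here is a hypothetical simplification (the string-returning `GetSamplingStrategy` signature is invented for the example, not Jaeger's real one), and `closeIfCloser` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// StrategyStore is a simplified, hypothetical interface without io.Closer.
type StrategyStore interface {
	GetSamplingStrategy(serviceName string) (string, error)
}

// staticStore holds no resources, so it has no Close method.
type staticStore struct{}

func (staticStore) GetSamplingStrategy(string) (string, error) { return "probabilistic", nil }

// closableStore owns resources and implements io.Closer.
type closableStore struct{ closed bool }

func (*closableStore) GetSamplingStrategy(string) (string, error) { return "adaptive", nil }

func (s *closableStore) Close() error {
	if s.closed {
		return errors.New("already closed")
	}
	s.closed = true
	return nil
}

// closeIfCloser calls Close only when the concrete value supports it.
func closeIfCloser(s StrategyStore) error {
	if c, ok := s.(io.Closer); ok {
		return c.Close()
	}
	return nil
}

func main() {
	stores := []StrategyStore{staticStore{}, &closableStore{}}
	for _, s := range stores {
		fmt.Printf("%T close error: %v\n", s, closeIfCloser(s))
	}
}
```

The trade-off is that nothing forces an implementation to be closable, so a store that forgets `Close` silently leaks; embedding `io.Closer` in the interface, as this PR does, makes that a compile-time requirement.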
Yes, I think that makes sense.

> This doesn't work well when factories are involved since the factory does return an interface, not an actual struct.

Is there a fundamental reason why factories shouldn't return a struct instead of an interface? (Other than it being a breaking change to make in this instance.)

> One other workaround to that is doing a runtime check for io.Closer interface and only then calling close - this is why I am asking about prod code calling it.

Prod code is not calling `Close`. Do you have a preference between the current implementation and the runtime check in tests? I don't feel strongly.
> Is there a fundamental reason why factories shouldn't return a struct instead of an interface?

Yes: polymorphism. The whole point of a factory is to abstract which underlying implementation it creates, which means it always returns an interface.
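To illustrate the polymorphism point, here is a sketch of a factory function whose return type has to be an interface because it can build different concrete types. `Store`, `NewStore`, and the `"static"`/`"adaptive"` selector are hypothetical names for this example, not Jaeger's factory API.

```go
package main

import "fmt"

// Store is a hypothetical, minimal strategy-store interface.
type Store interface {
	Kind() string
}

type staticStore struct{}

func (staticStore) Kind() string { return "static" }

type adaptiveStore struct{}

func (adaptiveStore) Kind() string { return "adaptive" }

// NewStore plays the role of a factory: callers receive a Store and never see
// which concrete type was built, so the return type must be an interface.
func NewStore(kind string) (Store, error) {
	switch kind {
	case "static":
		return staticStore{}, nil
	case "adaptive":
		return adaptiveStore{}, nil
	default:
		return nil, fmt.Errorf("unknown sampling type %q", kind)
	}
}

func main() {
	for _, k := range []string{"static", "adaptive", "bogus"} {
		s, err := NewStore(k)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("built", s.Kind())
	}
}
```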
> Prod code is not calling Close

I actually think our pattern is that the main code only calls `factory.Close()` and the factory is generally responsible for releasing any resources. For example, we don't call `Close` on the `SpanReader` that we obtain from the factory.
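A sketch of the "main only calls `factory.Close()`" pattern: the factory remembers every closable component it hands out and releases them all in its own `Close`. `Factory`, `component`, and `CreateComponent` are hypothetical names; this is one possible shape under those assumptions, not how Jaeger's factories are actually implemented. `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Factory is hypothetical: it tracks everything closable it creates so the
// caller only needs to call factory.Close().
type Factory struct {
	closers []io.Closer
}

type component struct {
	name   string
	closed bool
}

func (c *component) Close() error {
	c.closed = true
	return nil
}

// CreateComponent returns a component and registers it for later cleanup.
func (f *Factory) CreateComponent(name string) *component {
	c := &component{name: name}
	f.closers = append(f.closers, c)
	return c
}

// Close releases everything the factory created, in reverse creation order,
// and reports all errors together (nil if there were none).
func (f *Factory) Close() error {
	var errs []error
	for i := len(f.closers) - 1; i >= 0; i-- {
		if err := f.closers[i].Close(); err != nil {
			errs = append(errs, err)
		}
	}
	f.closers = nil
	return errors.Join(errs...)
}

func main() {
	f := &Factory{}
	reader := f.CreateComponent("span-reader")
	store := f.CreateComponent("strategy-store")
	// main never closes reader or store directly
	if err := f.Close(); err != nil {
		fmt.Println("close error:", err)
	}
	fmt.Println(reader.name, reader.closed, store.name, store.closed)
}
```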
Blocking on all goroutines fully returning is not necessary to appease goleak. See: https://github.com/jaegertracing/jaeger/pull/5310/files/dfe7adc7fd34389e2e507d9512eb64b5e19048f7#r1544603132
Signed-off-by: Will Sewell <[email protected]>
Which problem is this PR solving?
Description of the changes
`runUpdateProbabilitiesLoop` had a long delay, so tests need to be able to override the default `Processor.followerRefreshInterval`, or `Close` would take up to ~20s to return. More context on this here: Enable and enforce goroutine leak checks in tests #5006 (comment), specifically regarding how to override this in the `Factory`.
How was this change tested?
- `make test lint`
Checklist
- `jaeger`: `make lint test`
- `jaeger-ui`: `yarn lint` and `yarn test`